-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add delete backup cmd using finalizer and simplify GC process #252
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass. Need to do another pass to review the actual gc changes
pkg/cmd/cli/delete/delete.go
Outdated
@@ -0,0 +1,41 @@ | |||
/* | |||
Copyright 2017 Heptio Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
project ark contribs
pkg/controller/backup_controller.go
Outdated
@@ -229,6 +229,9 @@ func (controller *backupController) processBackup(key string) error { | |||
// set backup version | |||
backup.Status.Version = backupVersion | |||
|
|||
// add GC finalizer | |||
backup.Finalizers = append(backup.Finalizers, gcFinalizer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only add if it doesn't exist
pkg/controller/gc_controller.go
Outdated
var _ Interface = &gcController{} | ||
backupInformer.Informer().AddEventHandler( | ||
cache.ResourceEventHandlerFuncs{ | ||
UpdateFunc: func(old interface{}, new interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a real function so we can unit test it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since new
is a reserved keyword, it might be better not to use it. Looking at the kube source, I've seen
func(oldObj, newObj interface{})
func(old, cur interface{})
I think I prefer the first form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also since you're not using old
, you can just do func(_, newObj interface{})
pkg/controller/gc_controller.go
Outdated
} | ||
log.Debugf("Backup has finalizers %s", backup.Finalizers) | ||
|
||
for x, finalizer := range backup.Finalizers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd extract this to a function that just returns a bool. No need for idx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then maybe make another helper that returns a new slice w/the finalizer removed. See e.g. https://github.com/kubernetes/kubernetes/blob/722cb7a758d4678e7ed5c4f09ded99d6bc5b0b4c/staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/helpers.go#L109-L118
pkg/controller/gc_controller.go
Outdated
return | ||
} | ||
|
||
// no more finalizers: we don't need to issue another delete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be "There are other finalizers, so we won't issue a delete as other controllers still need to process" along with a len >= 1 check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure on how this is supposed to work. I'm going off of https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#finalizers which says:
...Each controller then removes its finalizer from the list and issues the delete request again. This request only deletes the object if the list of finalizers is now empty, meaning all finalizers are done.
However, the behavior I was seeing was that if I issued a delete after removing the last/only finalizer, it would fail with object not found; the object appeared to be deleted once the last finalizer had been removed.
4e95706
to
9ccb14e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: CRD finalizers require kube 1.7.5 or newer.
pkg/cmd/cli/delete/delete.go
Outdated
backupCommand.Aliases = []string{"backups"} | ||
|
||
c.AddCommand( | ||
backupCommand, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wire in schedules and restores too? Or in a follow-up?
pkg/controller/gc_controller.go
Outdated
log.WithError(err).Error("Error marshaling finalizers patch") | ||
} | ||
|
||
upd, err := c.backupClient.Backups(api.DefaultNamespace).Patch(backup.Name, types.MergePatchType, patchBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use backup.Namespace instead of api.DefaultNamespace so we don't have to come back later and change it (to support an arbitrary namespace for ark)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to pull in client-go's retry
package and use retry.RetryOnConflict
. Or we could just stop here on conflict and let the next run of the loop try again.
pkg/controller/gc_controller.go
Outdated
c.processBackups() | ||
} | ||
now := c.clock.Now() | ||
c.logger.Info("Garbage-collecting backups that have expired as of now") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove "that have expired as of now" 😄
pkg/controller/gc_controller.go
Outdated
for _, backup := range backups { | ||
log := c.logger.WithField("backup", kube.NamespaceAndName(backup)) | ||
if backup.Status.Expiration.Time.After(now) { | ||
log.Info("Backup has not expired yet, skipping") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need this at the info level still?
pkg/controller/gc_controller.go
Outdated
// since backups have a finalizer, this will actually have the effect of setting a deletionTimestamp and calling | ||
// an update. The update will be handled by this controller and will result in a deletion of the obj storage | ||
// files and the API object. | ||
if err := c.backupClient.Backups(api.DefaultNamespace).Delete(backup.Name, &metav1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backup.Namespace
pkg/controller/gc_controller.go
Outdated
return | ||
} | ||
|
||
// no more finalizers: we don't need to issue another delete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to api machinery: no need to issue additional deletes. The only action we need to perform is the patch to remove our finalizer.
} | ||
|
||
log.Infof("Garbage-collecting backup") | ||
if err := c.garbageCollect(backup, log); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this call errors once, it's entirely possible (likely even?) that it will continue to error the next time we try it for the same backup. It might be worth considering switching to a rate limited queue (like our other controllers), or keeping track of how many times we've tried to delete (via annotations) and stopping after some fixed limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this is PR-blocking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No
pkg/controller/gc_controller_test.go
Outdated
expectedObjectStorageDeletions: sets.NewString("backup-1"), | ||
}, | ||
{ | ||
name: "if snapshot deletion fails, don't delete kube backup and return error", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't deleting the kube backup from garbageCollect
any more
pkg/controller/gc_controller_test.go
Outdated
snapshots sets.String | ||
backupFiles sets.String | ||
backupMetadataFiles sets.String | ||
restores []*api.Restore | ||
nilSnapshotService bool | ||
expectedErr bool | ||
expectedRestoreDeletes []string | ||
expectedBackupDelete string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used any more by this test
pkg/controller/gc_controller_test.go
Outdated
snapshots sets.String | ||
backupFiles sets.String | ||
backupMetadataFiles sets.String | ||
restores []*api.Restore | ||
nilSnapshotService bool | ||
expectedErr bool | ||
expectedRestoreDeletes []string | ||
expectedBackupDelete string | ||
expectedSnapshots sets.String | ||
expectedObjectStorageDeletions sets.String | ||
}{ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may just want to convert this to a few test cases:
- nil snapshot server, backup has snapshots, return error (ensure that no DeleteSnapshot, DeleteBackupDir, restore Delete calls are issued (handled by mock.AssertExpectations etc))
- happy path
- do as much as you can (some DeleteSnapshot errors & successes, DeleteBackupDir error & success, some restore Delete errors & successes) and return error(s) encountered
Need unit test coverage on handleFinalizer |
pkg/controller/gc_controller.go
Outdated
upd, err := c.backupClient.Backups(api.DefaultNamespace).Patch(backup.Name, types.MergePatchType, patchBytes) | ||
if err != nil { | ||
if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error { | ||
_, err = c.backupClient.Backups(backup.Namespace).Patch(backup.Name, types.MergePatchType, patchBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use RetryOnConflict, we have to regenerate the patchMap in the anonymous func, based on an updated copy of the backup (either from the lister, or using a real client to fetch the data from the apiserver). Otherwise we'll keep sending the same patch and getting the same conflict each time. That's why I was suggesting it might be easier to just let the next iteration handle it.
2361337
to
af71115
Compare
addressed everything except the rate-limited queue comment. |
This also fixes #77, doesn't it? |
pkg/controller/gc_controller.go
Outdated
} | ||
log.Debugf("Backup has finalizers %s", backup.Finalizers) | ||
|
||
// only process the GC finalizer when it's the first item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the kube code and can't find any place where it only acts on a finalizer if it's the first element in the slice. Let's go back to proceeding if the slice contains our finalizer, regardless of position. Sorry for the flip-flop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, easy to switch back.
continue | ||
restoreLog.Info("Deleting restore referencing backup") | ||
if err := c.restoreClient.Restores(restore.Namespace).Delete(restore.Name, &metav1.DeleteOptions{}); err != nil { | ||
restoreLog.WithError(errors.WithStack(err)).Error("Error deleting restore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this should just be an error so we consider the GC of the backup a failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, in my opinion no -- I'm not too worried about orphaned restores (since they don't cost any cloud resources), plus that would have the result of leaving the backup API object around, despite its tarball/etc. potentially having been deleted, which could make a user believe it's still restorable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wfm
Re: #77 , it doesn't address it directly but does mean that Ark won't orphan files in obj storage. However, if a user manually deletes a backup API obj, or manually adds files to obj storage, Ark won't be able to clean them up. Actually, if a user manually deletes an API obj, it'll get re-synced so that case is fine. So the only case we don't handle is files manually added to the bucket outside of Ark. I don't think we need to handle that. |
af71115
to
54eb048
Compare
I'm pretty sure BackupService.DeleteBackupDir deletes everything |
everything within a backup dir, yes. I think we can consider #77 addressed and if some other related issue pops up we can create a new one. |
Signed-off-by: Steve Kriss <[email protected]>
Signed-off-by: Steve Kriss <[email protected]>
54eb048
to
8e5feec
Compare
Probably worth making a release note for the version, though I don't think we currently have a process for doing so til release time. |
pkg/cmd/cli/backup/delete.go
Outdated
cmd.CheckError(err) | ||
|
||
if !serverVersion.AtLeast(controller.MinVersionForDelete) { | ||
cmd.CheckError(fmt.Errorf("this command requires the Kubernetes server version to be at least %s", controller.MinVersionForDelete)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to do errors.Errorf
for consistency with the rest of the code base? (Apparently we're still using it in pkg/restore/restore.go when adding warnings & errors to the restore result.)
1 minor q |
Signed-off-by: Steve Kriss <[email protected]>
bf25e54
to
1c97478
Compare
sure, updated. |
@ncdc did you have anything further here? this is ready to merge from my perspective. |
Nope, merging |
…ore-priority Add vsb to restore priority
Signed-off-by: Steve Kriss [email protected]
Fixes #212
Fixes #77
@ncdc PTAL and provide feedback before I fix unit tests. I've manually tested successfully.